Convergent Fitted Value Iteration with Linear Function Approximation

Author

  • Dan Lizotte
Abstract

Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, “Expansion-Constrained Ordinary Least Squares” (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the least squares regression operator to be a non-expansion in the ∞-norm. We show that the space of function approximators that satisfy this constraint is richer than the space of “averagers,” we prove a minimax property of the ECOLS residual error, and we give an efficient algorithm for computing the coefficients of ECOLS based on constraint generation. We illustrate the algorithmic convergence of FVI with ECOLS in a suite of experiments, and discuss its properties.
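The non-expansion condition mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the ECOLS algorithm. It assumes the regression operator is taken to be the matrix that maps training targets to fitted values at the same points, and it uses the standard fact that the induced ∞-norm of a matrix is its largest absolute row sum. The function names (`hat_matrix`, `inf_norm`, `is_non_expansion`) and the toy feature matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def hat_matrix(Phi):
    # Ordinary least squares smoother: maps targets y to fitted values.
    # For full-column-rank Phi this equals Phi (Phi^T Phi)^{-1} Phi^T.
    return Phi @ np.linalg.pinv(Phi)

def inf_norm(H):
    # Induced infinity-norm of a matrix: largest absolute row sum.
    return np.abs(H).sum(axis=1).max()

def is_non_expansion(H, tol=1e-9):
    # H is a non-expansion in the infinity-norm iff its induced norm is <= 1.
    return inf_norm(H) <= 1.0 + tol

# Illustrative assumption: linear features [1, x] on unevenly spaced points.
x = np.array([0.0, 1.0, 10.0])
Phi_linear = np.column_stack([np.ones_like(x), x])
H_ols = hat_matrix(Phi_linear)

# Illustrative assumption: state-aggregation features, a simple "averager".
Phi_agg = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
H_agg = hat_matrix(Phi_agg)

print("OLS, linear features:", inf_norm(H_ols), is_non_expansion(H_ols))
print("State aggregation:   ", inf_norm(H_agg), is_non_expansion(H_agg))
```

In this toy case the unconstrained OLS smoother has a negative entry, so its absolute row sum exceeds 1 and the check fails, while the aggregation smoother passes. This is consistent with the abstract's motivation for constraining the regression operator.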


Similar resources

A Convergent Form of Approximate Policy Iteration

We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with ...


Bias Correction and Confidence Intervals for Fitted Q-iteration

We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called soft-threshold estimator, derive it as an approximate empirical Bayes estimator, ...


An Effective Method for Seventh-Order Boundary Value Problems

In this paper, we used the Optimal Homotopy Asymptotic Method (OHAM) to find the approximate solution of seventh-order linear and nonlinear boundary value problems. The approximate solution using OHAM is compared with the Variational Iteration Method (VIM) and exact solutions, and excellent agreement has been observed. The approximate solution of the equations is obtained in terms of convergent seri...


Dhage iteration method for PBVPs of nonlinear first order hybrid integro-differential equations

In this paper, the author proves algorithms for the existence as well as the approximation of solutions to a couple of periodic boundary value problems of nonlinear first order ordinary integro-differential equations using operator theoretic techniques in a partially ordered metric space. The main results rely on the Dhage iteration method embodied in the recent hybrid fixed point theorems of D...


Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with |S| states, |A| actions, discount factor γ ∈ (0, 1), and rewards in the range [−M, M], we show how to compute an ε-optimal policy, with probability 1 − δ in time Õ((|S|²|A| + |S||A|/(1 − γ)³) log(M/ε) log...




Publication date: 2011